NYC Shelters, Are We All in This Together?
As housing prices increase, low-income and other vulnerable populations face a greater risk of homelessness and may find themselves in one of the city’s shelters. Some boroughs are more expensive than others and thus contribute more to the shortage of affordable housing, but it is important for each borough to shoulder the burden equally so that no single borough’s shelter system is overwhelmed. But is this currently the reality?
Using data from the Department of Homeless Services (DHS) and the Department of City Planning (DCP), I examined which borough(s) had the most shelters and individuals in their shelter system. I then looked at population data to determine how many shelters per capita each borough had. From this, I isolated one borough to examine the distribution across community districts.
I wanted to know which borough, if any, had the most shelters and individuals in its shelter system, and whether shelters were distributed equally across community districts. I hypothesized that either the Bronx or Brooklyn would have the most shelters and individuals because of New York’s history of pushing locally unwanted land uses (LULUs) to those boroughs. I also hypothesized that shelters would be unevenly distributed among community districts.
As you will see, my hypotheses were partially supported. Brooklyn and the Bronx had the most shelters, but Queens had the most individuals in its shelter system, ahead of both. The Bronx had the most shelters per capita and an uneven distribution of shelters across its community districts. Exploring why that may be is an area for future study.
About the Data
Three datasets were used in this analysis:
The first dataset is Buildings by Borough and Community District. It displays the total number of shelter buildings by borough and community district. There are seven types of shelters listed within the dataset: Adult Family Commercial Hotel, Adult Family Shelter, Adult Shelter, Adult Shelter Commercial Hotel, Family Cluster, Family with Children Commercial Hotel, and Family with Children Shelter.
The second dataset is Individual Census by Borough, Community District, and Facility Type. It presents the number of individuals for each shelter facility type by borough and community district. It is broken down by the same seven types of shelters and includes counts from different days. For this analysis, the most recent count (March 31, 2024) was used.
The third dataset is New York City Population by Borough, 1950 - 2040. It provides the unadjusted decennial census data from 1950-2000 and projected figures from 2010-2040.
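Because the census file holds counts from several days, it is worth keeping only the latest count date before totaling. The following is a minimal sketch of that filter, not the notebook's own code. The column name `Report Date`, the toy rows, and their values are all assumptions made for illustration; the real file may label its date column differently.

```python
import io

import pandas as pd

# Toy CSV standing in for the individual census file. The column names
# "Report Date" and "Adult Shelter" and every value here are made up.
raw = io.StringIO(
    "Report Date,Borough,Community District,Adult Shelter\n"
    "02/29/2024,Bronx,1,100\n"
    "03/31/2024,Bronx,1,110\n"
    "02/29/2024,Queens,1,200\n"
    "03/31/2024,Queens,1,210\n"
)
census = pd.read_csv(raw, parse_dates=["Report Date"])

# Keep only the rows recorded on the most recent count date
latest_date = census["Report Date"].max()
latest = census[census["Report Date"] == latest_date]

print(latest_date.date(), len(latest))
```

Filtering first ensures that later sums describe a single day rather than adding several count dates together.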
# Import packages for data analysis
import pandas as pd
import plotly.express as px
import numpy as np
import matplotlib.pyplot as plt
# total number of shelter buildings by borough and community district
shelters = pd.read_csv("Buildings_by_Borough_and_Community_District.csv")
# shelters.head()
# number of individuals for each shelter facility type by borough and community district
individuals = pd.read_csv("Individual_Census_by_Borough__Community_District__and_Facility_Type.csv")
# individuals.head()
# population by borough, 1950-2040 (decennial census counts and projections)
population = pd.read_csv("New_York_City_Population_by_Borough.csv")
# population
New York City Shelters: Brooklyn Takes the Lead
To begin my analysis, I first changed ‘NaN’ values to 0 so that the shelter types could be added across each row without producing missing totals. I then created a ‘total_cd_shelters’ column by adding all shelter types together across each row, grouped the dataframe by borough, and summed the community district totals within each borough. The new total is stored in the ‘shelter_count’ column. Grouping makes the boroughs the index (first column) of the dataframe. Finally, I removed the row for Westchester, since it is not a borough but was included in the data.
# Change NaN values to 0
shelters = shelters.fillna(0)
# shelters
# Add column to get total shelter counts by community district
shelters['total_cd_shelters'] = shelters['Adult Family Comm Hotel'] + shelters['Adult Family Shelter'] + shelters['Adult Shelter'] + shelters['Adult Shelter Comm Hotel'] + shelters['FWC Cluster'] + shelters['FWC Comm Hotel'] + shelters['FWC Shelter']
# shelters.head()
# Group by and generate sum of shelter buildings by borough
borough_shelters = shelters.groupby(['Borough'])['total_cd_shelters'].sum().to_frame(name="shelter_count")
# Drop the last row (Westchester), which sorts after the five boroughs and is not a borough
borough_shelters = borough_shelters[:-1]
borough_shelters
| Borough | shelter_count |
|---|---|
| Bronx | 132.0 |
| Brooklyn | 148.0 |
| Manhattan | 94.0 |
| Queens | 116.0 |
| Staten Island | 8.0 |
Brooklyn has the most shelters of all boroughs with 148, and the Bronx has the second most with 132. Staten Island has the fewest, with just 8 shelters, and, surprisingly to me, Manhattan has the second fewest with 94.
To help visualize this discrepancy, I plotted the counts with px.histogram, which sums the y values for each borough and so produces what is effectively a bar chart. The discrete borough colors defined here are used consistently throughout the analysis.
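The row totals and borough sums described above can also be computed without listing every column in one long addition. The sketch below is one possible variant, under stated assumptions: the rows are made up, only two of the seven type columns are shown, and the index label `Westchester` is assumed to match the label in the real file.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only two of the seven type columns appear.
toy = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Brooklyn", "Westchester"],
    "Adult Shelter": [3, np.nan, 5, 1],
    "FWC Shelter": [2, 4, np.nan, np.nan],
})
type_cols = ["Adult Shelter", "FWC Shelter"]

# sum(axis=1) skips NaN by default, so no separate fillna(0) step is needed
toy["total_cd_shelters"] = toy[type_cols].sum(axis=1)

totals = (
    toy.groupby("Borough")["total_cd_shelters"]
    .sum()
    .to_frame(name="shelter_count")
    # Drop the non-borough row by label (assumed to be "Westchester");
    # errors="ignore" keeps this from failing if the label is absent.
    .drop(index="Westchester", errors="ignore")
)
print(totals)
```

Dropping by label rather than by position avoids removing a real borough if the extra row is ever missing or sorts differently.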
color_discrete_map = {'Bronx': '#D00000', 'Brooklyn': '#FFBA08','Manhattan': '#3F88C5', 'Queens': '#032B43', 'Staten Island': '#136F63'}
fig = px.histogram(borough_shelters,
x=borough_shelters.index,
y='shelter_count',
color = borough_shelters.index,
color_discrete_map=color_discrete_map,
title='Total Shelters by Borough').update_layout(
yaxis_title="# of Shelters"
)
fig.update_layout(
margin=dict(l=20, r=20, t=50, b=20),
showlegend=False,
)
fig.show()
Individuals in City Shelters: Queens Reigns Supreme
I completed the same steps as above, but this time used individuals recorded in the shelter system on March 31, 2024.
# Change NaN values to 0
individuals = individuals.fillna(0)
# individuals
# Add column to get total individual counts by community district
individuals['total_cd_individuals'] = individuals['Adult Family Commercial Hotel'] + individuals['Adult Family Shelter'] + individuals['Adult Shelter'] + individuals['Adult Shelter Commercial Hotel'] + individuals['Family Cluster'] + individuals['Family with Children Commercial Hotel'] + individuals['Family with Children Shelter']
# individuals.head()
# Group by and generate sum of individuals in shelters by borough
borough_individuals = individuals.groupby(['Borough'])['total_cd_individuals'].sum().to_frame(name="individual_count")
# Drop the trailing non-borough row, as with the shelters table
borough_individuals = borough_individuals[:-1]
borough_individuals
| Borough | individual_count |
|---|---|
| Bronx | 20009.0 |
| Brooklyn | 22537.0 |
| Manhattan | 15739.0 |
| Queens | 25667.0 |
| Staten Island | 1184.0 |
Queens had the most individuals in its shelter system at 25,667. Brooklyn was close behind with 22,537. Staten Island and Manhattan had the fewest, with 1,184 and 15,739 individuals, respectively.
To visualize this, I again plotted a histogram using the same color scheme as before.
fig = px.histogram(borough_individuals,
x=borough_individuals.index,
y='individual_count',
color = borough_individuals.index,
color_discrete_map=color_discrete_map,
title='Individuals in Shelters by Borough').update_layout(
yaxis_title="# of Individuals in Shelters"
)
fig.update_layout(
margin=dict(l=20, r=20, t=50, b=10),
showlegend=False,
)
fig.show()
Average Individuals per Shelter: Queens Remains on Top
To get a clearer picture of how full each borough’s shelters are, I merged the shelter and individual data into one dataframe. I then created a new column with the average number of individuals per shelter.
borough_merge = pd.merge(left=borough_shelters, right=borough_individuals, left_on="Borough", right_on="Borough")
# borough_merge
#calculate average individuals per shelter and round to the nearest integer
borough_merge["shelter_avg"] = round(borough_merge["individual_count"]/borough_merge["shelter_count"])
borough_merge
| Borough | shelter_count | individual_count | shelter_avg |
|---|---|---|---|
| Bronx | 132.0 | 20009.0 | 152.0 |
| Brooklyn | 148.0 | 22537.0 | 152.0 |
| Manhattan | 94.0 | 15739.0 | 167.0 |
| Queens | 116.0 | 25667.0 | 221.0 |
| Staten Island | 8.0 | 1184.0 | 148.0 |
Queens was still in the lead with an average of 221 individuals per shelter. Surprisingly, Manhattan was in second place with an average of 167. So, while Manhattan has among the fewest shelters and individuals in shelters, each of its shelters houses more people on average than those in Brooklyn or the Bronx.
Staten Island is again in last place with 148 individuals per shelter, while Brooklyn and the Bronx are tied at 152 after rounding to the nearest integer.
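Rounding to whole numbers hides a small gap between Brooklyn and the Bronx. As a quick check, the sketch below recomputes the averages to one decimal place from the counts in the tables above; the variable names are introduced here for illustration.

```python
import pandas as pd

boroughs = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]

# Counts copied from the shelter and individual tables
shelter_count = pd.Series([132, 148, 94, 116, 8], index=boroughs, name="shelter_count")
individual_count = pd.Series(
    [20009, 22537, 15739, 25667, 1184], index=boroughs, name="individual_count"
)

# Align the two series on their shared borough index
merged = pd.concat([shelter_count, individual_count], axis=1)

# Average individuals per shelter, kept to one decimal place
merged["shelter_avg"] = (merged["individual_count"] / merged["shelter_count"]).round(1)
print(merged.sort_values("shelter_avg", ascending=False))
```

At this precision Brooklyn (about 152.3) sits slightly above the Bronx (about 151.6), so the tie is a product of rounding.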
Below is a visualization of the data using the same techniques as before.
fig = px.histogram(borough_merge,
x=borough_merge.index,
y='shelter_avg',
color = borough_merge.index,
color_discrete_map=color_discrete_map,
title='Average Number of Individuals per Shelter by Borough').update_layout(
yaxis_title="Avg. Individuals in Shelters"
)
fig.update_layout(
margin=dict(l=20, r=20, t=50, b=10),
showlegend=False,
)
fig.show()
Compared to the other boroughs, Queens stands out. This may indicate a capacity issue in the borough.
Shelters per Capita: The Bronx Pulls Ahead
To get a better understanding of how fairly shelters are distributed across the boroughs, I looked at population data. The data had to be cleaned prior to analysis: I kept the two columns I wanted to analyze (‘Borough’ for merging purposes and the 2020 population figures, which are projections in this dataset), deleted the rest, dropped the first row, removed extra white space from the borough names, and made the boroughs the index of the dataframe.
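# keep only the borough names and the 2020 population column
columns = [
    "Borough",
    "2020",
]
clean_pop = population[columns]
# drop row 0 (not one of the five borough rows, presumably a citywide total) and renumber
clean_pop = clean_pop.drop(0).reset_index(drop=True)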
# clean_pop
clean_pop['Borough'].unique()
array([' Bronx', ' Brooklyn', ' Manhattan', ' Queens',
' Staten Island'], dtype=object)
# remove extra white space from population dataframe
def whitespace_remover(dataframe):
    # iterating over the columns
    for i in dataframe.columns:
        # checking datatype of each column
        if dataframe[i].dtype == 'object':
            # applying strip function on column
            dataframe[i] = dataframe[i].map(str.strip)
        else:
            # if the condition is False, do nothing
            pass

whitespace_remover(clean_pop)
clean_pop = clean_pop.set_index(["Borough"])
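The same cleanup can be written more compactly with the vectorized `.str` accessor. This is a minimal sketch rather than the notebook's own method; the `pop` frame is rebuilt here from the borough labels and 2020 figures shown in this section so that the example runs on its own.

```python
import pandas as pd

# Borough labels with the leading spaces seen in the unique() output,
# paired with the 2020 figures from the per capita table
pop = pd.DataFrame({
    "Borough": [" Bronx", " Brooklyn", " Manhattan", " Queens", " Staten Island"],
    "2020": [1446788, 2648452, 1638281, 2330295, 487155],
})

# Strip surrounding whitespace from the one text column, then index by it
pop["Borough"] = pop["Borough"].str.strip()
pop = pop.set_index("Borough")

print(pop.index.tolist())
```

Targeting the column by name avoids depending on how pandas labels the dtype of string columns, which the dtype check in a generic helper has to get right.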
After cleaning the data, I merged the population dataframe with the shelters dataframe. I then calculated shelters per capita by dividing the number of shelters by the population of each borough.
borough_pop = pd.merge(left=borough_shelters, right=clean_pop, left_on="Borough", right_on="Borough")
borough_pop["shelters_per_capita"] = borough_pop["shelter_count"]/borough_pop["2020"]
borough_pop
| Borough | shelter_count | 2020 | shelters_per_capita |
|---|---|---|---|
| Bronx | 132.0 | 1446788 | 0.000091 |
| Brooklyn | 148.0 | 2648452 | 0.000056 |
| Manhattan | 94.0 | 1638281 | 0.000057 |
| Queens | 116.0 | 2330295 | 0.000050 |
| Staten Island | 8.0 | 487155 | 0.000016 |
The table shows that the Bronx has the highest per capita rate of the boroughs, at about 9.1 shelters per 100,000 residents. Manhattan and Brooklyn are second and third, respectively, with similar rates (about 5.7 and 5.6 per 100,000). Queens follows at about 5.0, and Staten Island remains in last place at about 1.6.
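Raw per capita values such as 0.000091 are hard to compare at a glance, so it can help to rescale them to shelters per 100,000 residents. The sketch below does this with the counts and 2020 figures from the table; the `shelters_per_100k` column name is introduced here for illustration.

```python
import pandas as pd

# Shelter counts and 2020 population figures copied from the table
rates = pd.DataFrame(
    {
        "shelter_count": [132, 148, 94, 116, 8],
        "2020": [1446788, 2648452, 1638281, 2330295, 487155],
    },
    index=pd.Index(
        ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
        name="Borough",
    ),
)

# Rescale the per capita rate to shelters per 100,000 residents
rates["shelters_per_100k"] = (rates["shelter_count"] / rates["2020"] * 100_000).round(1)

print(rates["shelters_per_100k"].sort_values(ascending=False))
```

On this scale the Bronx comes out at roughly 9.1 shelters per 100,000 residents, well above every other borough.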
I visualized this in a histogram to better see the distinction.
fig = px.histogram(borough_pop,
x= borough_pop.index,
y='shelters_per_capita',
color = borough_pop.index,
color_discrete_map=color_discrete_map,
title='Number of Shelters per Capita by Borough').update_layout(
yaxis_title="Shelters per Capita"
)
fig.update_layout(
margin=dict(l=20, r=20, t=50, b=10),
showlegend=False,
)
fig.show()
The Bronx is securely in first place with the most shelters per capita. This points to an uneven distribution of shelters across boroughs, which presents an equity problem.
Hypothesis Revisited
I predicted that either Brooklyn or the Bronx would have the most shelters and the most individuals in the shelter system. I was partly right: Brooklyn had the most shelters, but Queens had the most individuals and the highest average number of individuals per shelter. The Bronx did, however, have the most shelters per capita.
Additionally, shelters were unevenly distributed across the Bronx’s community districts. This raises further questions about why the discrepancy is so large, among them: (1) What are the poverty rates in the area? (2) What is the renter/owner makeup of the area? (3) What is the racial composition of the area? These are all areas for future study that could expand our understanding of how New York City distributes the work of assisting its unhoused population, and they matter if the goal is equity in the distribution of resources.
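One way to quantify unevenness within a single borough is to compare each community district's share of shelters with the share it would hold under an even split. The sketch below illustrates the idea only: the district totals are entirely made up, and the column name `Community District` is an assumption about how the real file labels districts.

```python
import pandas as pd

# Entirely made-up district totals, for illustration only
districts = pd.DataFrame({
    "Borough": ["Bronx"] * 4 + ["Queens"] * 2,
    "Community District": [1, 2, 3, 4, 1, 2],
    "total_cd_shelters": [20, 5, 10, 5, 7, 3],
})

# Isolate one borough and index it by community district
bronx = districts[districts["Borough"] == "Bronx"].set_index("Community District")

# Each district's share of the borough's shelters
share = bronx["total_cd_shelters"] / bronx["total_cd_shelters"].sum()

# Under a perfectly even split, every district would hold this share
even_share = 1 / len(bronx)

# Gap between the most and least burdened districts
spread = share.max() - share.min()

print(share.round(3))
print(f"even share: {even_share:.3f}, spread: {spread:.3f}")
```

A spread near zero would indicate an even distribution; larger values mean a few districts carry most of the borough's shelters.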